ABSTRACT For nearly 3 decades, listeriologists and immunologists have used mainly three strains of the same serovar (1/2a) to analyze the virulence of the bacterial pathogen Listeria monocytogenes. The genomes of two of these strains, EGD-e and 10403S, were released in 2001 and 2008, respectively. Here we report the genome sequence of the third reference strain, EGD, and extensive genomic and phenotypic comparisons of the three strains. Strikingly, EGD-e is genetically highly distinct from EGD (29,016 single nucleotide polymorphisms [SNPs]) and 10403S (30,296 SNPs) and is more closely related to serovar 1/2c than to serovar 1/2a strains. We also found that while EGD and 10403S strains are genetically very close (317 SNPs), EGD has a point mutation in the transcriptional regulator PrfA (PrfA*), leading to constitutive expression of several major virulence genes. We generated an EGD-e PrfA* mutant and showed that EGD behaves like this strain in vitro, with slower growth in broth and higher invasiveness in human cells than EGD-e and 10403S. In contrast, bacterial counts in blood, liver, and spleen during infection in mice revealed that EGD and 10403S are less virulent than EGD-e, which is itself less virulent than EGD-e PrfA*. Thus, constitutive expression of PrfA-regulated virulence genes does not appear to provide a significant advantage to the EGD strain during infection in vivo, highlighting the fact that in vitro invasion assays are not sufficient for evaluating the pathogenic potential of L. monocytogenes strains. Together, our results pave the way for deciphering unexplained differences or discrepancies in experiments using different L. monocytogenes strains.

IMPORTANCE Over the past 3 decades, Listeria has become a model organism for host-pathogen interactions, leading to critical discoveries in a broad range of fields, including bacterial gene regulation, cell biology, and bacterial pathophysiology. Scientists studying Listeria use primarily three pathogenic strains: EGD, EGD-e, and 10403S. Despite many studies on EGD, it is the only one of the three strains whose genome has not been sequenced. Here we report the sequence of its genome and a series of important genomic and phenotypic differences between the three strains, in particular, a critical mutation in EGD's PrfA, the main regulator of Listeria virulence. Our results show that the three strains differ in ways that may play an important role in the differences in virulence observed between them. Our findings will be of critical relevance to listeriologists and immunologists who have used or may use Listeria as a tool to study the pathophysiology of listeriosis and immune responses.

Listeria monocytogenes is a low-GC-content, Gram-positive, rod-shaped bacterium living in a variety of environments, such as soil and decaying vegetation, and can infect animals and humans by means of contaminated food products. The pathogenic properties of L. monocytogenes rely on its ability to cross three host barriers (the intestinal, placental, and blood-brain barriers) and also on its ability to enter, replicate, and survive in a wide range of human cell types, such as macrophages, epithelial cells, and endothelial cells, thanks to an arsenal of virulence factors. More than 50 virulence factors have been described (1), and the list continuously expands.

During the last three decades, L. monocytogenes has emerged as a model organism for the study of host-pathogen interactions (2-5), leading to critical discoveries in a broad range of fields, including virulence factor regulation, cell biology, bacterial adaptation to the host cytosol, and bacterial pathophysiology. In addition, since the pioneering studies of Mackaness (6), L. monocytogenes has been widely used as a model to study its interaction with professional phagocytes and host T-cell responses. Remarkably, most of these discoveries have been made using three L. monocytogenes strains: 10403S, EGD, and EGD-e. The genome of the EGD-e strain was sequenced in 2001 (RefSeq accession number NC_003210 [7]). The sequence and annotation of the 10403S genome have recently been released (NC_017544), as have those of several other strains (8-12). Currently, NCBI's RefSeq database contains 39 L. monocytogenes genomes, and this number will probably continue to grow exponentially in the coming years. In this context, the still-unknown sequence of the extensively used strain EGD was a gap to fill.

The EGD strain is from the Trudeau Institute (NCTC7973) and is derived from the original strain isolated from guinea pigs by E. G. D. Murray et al. in 1926 (13). The name Listeria monocytogenes was definitively coined by Pirie (14). Strain EGD was brought back to France by Patrick Berche (see reference 15) in 1982 after a stay at the Trudeau Institute with Robert North. Helmuth Hahn also obtained strain EGD from the Trudeau Institute and gave it to Trinad Chakraborty in 1986 (see reference 16). The two strains from the Trudeau Institute used to be passaged through mice to maintain virulence. When the Listeria genome sequencing project was initiated, the European consortium chose to sequence strain EGD, which was retested for its virulence in mice by Trinad Chakraborty and thereafter named EGD-e (where "e" stands for "European" [7]). L. monocytogenes 10403S is a streptomycin-resistant (83) derivative of 10403, reported to have been isolated from human skin lesions in Bozeman, MT (17).

The three strains belong to serovar 1/2a. The serotyping scheme, based on somatic (O) and flagellar (H) antigens, is the oldest technique used to differentiate L. monocytogenes strains (18) and has enabled the classification of L. monocytogenes into three main lineages (I, II, and III). A subpopulation of lineage III, lineage IIIB, is now called lineage IV (19, 20). Strikingly, a phylogenetic study by multilocus sequence typing (MLST) demonstrated that although EGD-e is of serotype 1/2a, it clusters with 1/2c strains and is distantly related to 10403S and EGD (21). Phenotypic differences among the three strains have been observed by listeriologists in the past but not published. However, in different studies that we reported, EGD was used in preference to EGD-e because of its higher invasiveness in human cells (22-27). Nevertheless, until now, no study has characterized in detail the differences between the three Listeria reference strains.

We report here the sequence and the annotation of the genome of L. monocytogenes EGD and a genomic and phenotypic comparison of the three laboratory model strains, EGD, EGD-e, and 10403S. A comparison of protein-coding genes and noncoding RNAs shows that even though two of the three strains have nearly the same name (EGD-e and EGD), they differ, with EGD being closer to 10403S and EGD-e being more distant. One major difference is a PrfA mutation found in EGD that induces overexpression of the PrfA-regulated genes (PrfA*), leading to higher invasiveness in cultured cells and a difference in virulence in animal models.

RESULTS 

Resequencing of the EGD-e genome sequence. Prior to sequencing the L. monocytogenes EGD genome, we resequenced the genome of strain EGD-e using the Illumina technique. Only five differences compared to the published sequence were found (Fig. 1A), confirming the high quality of the first published sequence (7, 28). As shown in Data Set S1 in the supplemental material, four of the five differences are in intergenic regions, where no small RNAs (sRNAs) have been identified so far, and only one difference induces an amino acid change, i.e., a glycine to a valine, in Lmo0247, a hypothetical protein.

EGD's genome sequence and its comparison to those of EGD-e and 10403S. The EGD genome was sequenced by the Illumina technique, assembled, annotated, and deposited in the European Nucleotide Archive (ENA) (accession number HG421741). Strain EGD has one chromosome of 2,907,193 bp and no plasmid. This genome is approximately the same size as that of strain 10403S (2,903,106 bp). The EGD-e genome (7) is about 40 kb larger, with a total size of 2,944,528 bp (Table 1).

A single nucleotide polymorphism (SNP) search using MUMmer (29), comparing all three strains to each other, revealed 29,016 SNPs (Table 1) between EGD and EGD-e (Fig. 1A and Data Set S1) and 30,296 SNPs between 10403S and EGD-e. In contrast, only 317 SNPs distinguish EGD from 10403S, indicating that EGD and 10403S are genetically very close. This result is consistent with the reported MLST analysis of EGD and 10403S (21), which shows a high similarity (Fig. 1B) in terms of sequence type (ST) between EGD (ST12) and 10403S (ST85), with both strains being classified in clonal complex 7 (CC7), whereas EGD-e belongs to CC9.
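Fig. 1A separates SNPs into synonymous, nonsynonymous, and intergenic changes. The classification rule can be sketched in Python as follows; this is an illustration, not the pipeline used in this study (MUMmer reports SNP positions, and the classification is assumed to happen afterward). The helper name `classify_snp`, the toy coding sequence, and the restriction to plus-strand genes are assumptions for illustration; a minus-strand gene would first need to be reverse-complemented. The codon table is the standard genetic code, whose amino acid assignments are shared by the bacterial code (NCBI translation table 11).

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, offset, alt):
    """Classify a single-base substitution (hypothetical helper).

    cds    -- plus-strand coding sequence starting at the start codon,
              or None if the SNP lies outside any annotated gene
    offset -- 0-based position of the SNP within cds
    alt    -- alternative base
    """
    if cds is None:
        return "intergenic"
    codon_start = offset - offset % 3
    ref_codon = cds[codon_start:codon_start + 3]
    pos_in_codon = offset % 3
    alt_codon = ref_codon[:pos_in_codon] + alt + ref_codon[pos_in_codon + 1:]
    # A change is synonymous when both codons encode the same amino acid.
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


toy_cds = "ATGGGCTAA"  # hypothetical Met-Gly-stop ORF
print(classify_snp(toy_cds, 5, "T"))  # GGC -> GGT, Gly -> Gly: synonymous
print(classify_snp(toy_cds, 3, "A"))  # GGC -> AGC, Gly -> Ser: nonsynonymous
print(classify_snp(None, 0, "A"))     # intergenic
```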

We found 2,848 open reading frames (ORFs) in EGD, a number close to the 2,846 ORFs predicted for EGD-e (7) and the 2,814 ORFs predicted for 10403S (Table 1). To investigate further the differences between these ORFs, we performed a bidirectional best-hit search (threshold E value, <1e-4). As shown in Fig. 2A, more than 95% (2,683) of EGD-e's ORFs are shared by the three strains. EGD, EGD-e, and 10403S have exactly the same number of rRNAs (18) and tRNAs (67). They also have the same low GC content (39%) at the level of the whole genome (Table 1).
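A bidirectional (reciprocal) best-hit search keeps a pair of ORFs only if each is the other's best-scoring match. A minimal Python sketch, assuming the hits have already been parsed from tabular BLASTP-style output into (query, subject, E value) tuples; the function names and the toy identifiers A1 to A3 and B1 to B4 are hypothetical, and ties in E value are resolved by keeping the first hit seen.

```python
def best_hits(hits, max_evalue=1e-4):
    """Map each query to its lowest-E-value subject, ignoring hits at or above max_evalue."""
    best = {}
    for query, subject, evalue in hits:
        if evalue >= max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-4):
    """Return sorted (a, b) pairs in which b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


a_vs_b = [
    ("A1", "B1", 1e-50),
    ("A2", "B2", 1e-30),
    ("A2", "B3", 1e-10),  # weaker second hit for A2
    ("A3", "B4", 1e-3),   # fails the E-value threshold
]
b_vs_a = [
    ("B1", "A1", 1e-50),
    ("B2", "A2", 1e-30),
    ("B3", "A2", 1e-12),  # B3's best hit is A2, but A2's best hit is B2
    ("B4", "A3", 1e-3),
]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('A1', 'B1'), ('A2', 'B2')]
```

Requiring the match in both directions discards one-sided hits such as B3 to A2, which would otherwise inflate the count of shared ORFs.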

Of the 393 ORFs not shared by the three strains, only 8 are common to EGD and EGD-e and absent from 10403S (Data Set S2). EGD and 10403S share 22 ORFs that are absent in EGD-e (Data Set S1), and EGD-e and 10403S share 36 ORFs (Fig. 2A) that are not found in EGD. Of these, 30 come from the A118 prophage, which is integrated into both EGD-e and 10403S, as previously described for EGD-e (7, 30). EGD has 135 ORFs which are not found in the two other strains (Fig. 2A and Data Set S2); 52 are phage proteins, and 50 are hypothetical proteins for which the RAST automatic annotation software found no homolog.

EGD has a prophage different from that of EGD-e and 10403S. Since EGD has 52 specific genes encoding putative phage proteins, we examined whether EGD had an integrated phage. A BLASTN search of each phage gene of the three strains (Fig. S1A) against 8 sequenced Listeria phage genomes (31) indicated the presence of A118 phage genes in EGD-e and 10403S, integrated into the competence gene comK (32), and the presence of B025 phage genes in EGD (Fig. 1C), integrated into the tRNA-Arg gene. B025 is found in the first half of the EGD genome (between LMON_1236 and LMON_1299) (Fig. S1B). A118 is integrated in the second half of the EGD-e genome (between lmo2271 and lmo2332) and the 10403S genome (between LMRG_01560 and LMRG_01510).

FIG 1 SNPs, synteny, and sequence type analysis of EGD, EGD-e, and 10403S. (A) SNPs among the EGD, 10403S, and EGD-e reference genomes. Purple indicates synonymous changes, blue indicates nonsynonymous changes, and black indicates intergenic changes. (B) Minimum spanning tree analysis of 360 L. monocytogenes strains based on MLST (multilocus sequence typing) data (adapted from reference 21). The EGD-e, EGD, and 10403S strains are highlighted in red. (C) Linear synteny view of the three strains. Phage B025 is integrated into EGD in the tRNA-Arg gene. Phage A118 is integrated inside the comK gene in EGD-e and 10403S. ComK is complete in EGD.

TABLE 1 General properties of EGD-e, EGD, and 10403S sequences and annotations

Conservation of sRNAs. In the past decade, noncoding RNAs in Listeria have been studied in detail (33-42). One study concerns strain 10403S (36), and all the others concern the EGD-e strain. We compiled a list of small noncoding RNAs (sRNAs) from these various publications. Altogether, 305 noncoding RNA elements have now been reported in L. monocytogenes: 155 sRNAs, 46 cis-regulatory elements (cisRegs), and 104 antisense RNAs (asRNAs).

A BLASTN search (-e 0.001 -W 4) of EGD-e RNAs against EGD and 10403S showed very high conservation, as for protein-coding genes (Data Set S1). Regarding sRNAs, 142 of the 155 (92%) are common to the three strains (Fig. 2B); 100% of cisRegs are conserved, as are 97% of the asRNAs. Only 9 sRNAs are found exclusively in EGD-e (Fig. S2A and B). The case of Rli38 is particularly interesting, as the whole region from lmo1097 (which encodes an integrase) up to the 5' end of the Rli38 gene seems to have been integrated in EGD-e upstream from lmo1116 (Fig. S2C).

Conservation of internalin genes. L. monocytogenes encodes a large family of proteins known as internalins, which possess a leucine-rich repeat (LRR)-containing domain. Twenty-five members of this family, including several virulence factors, were described for strain EGD-e and classified into three types (43) (Fig. S3A). InlA, the prototype internalin, and InlB promote L. monocytogenes internalization into mammalian cells and were initially identified in EGD (44-46). We found that InlA and InlB show 8 and 6 amino acid differences, respectively, in EGD and 10403S compared to their counterparts in EGD-e. With a BLASTP search of the internalins already described in EGD-e, we found 27 internalins across the genomes of strains EGD-e, EGD, and 10403S (Fig. 2C and Table 2).

Our new analysis of the EGD-e genome revealed the presence of a type IV internalin represented by Lmo0460, a predicted lipoprotein (47) containing an atypical LRR domain (Fig. 2D). Whether Lmo0460 is a bona fide lipoprotein remains to be confirmed. The lipobox of Lmo0460 is located at the expected distance from the N terminus and differs only slightly from the consensus lipobox, L(-3)-S/A(-2)-A/G(-1)-C(+1). Lmo0460 displays a novel type of LRR domain, with 10 repeats diverging from the internalin-LRR prototype motif by a longer length (26 instead of 22 amino acids [43]) and the presence of an MFXXCX sequence at the end of most repeats (Fig. 2D). The unusual Lmo0460 LRR repeats with the M-F motif are found in various predicted surface proteins (often lipoproteins) from other species, such as Listeria innocua, Enterococcus faecalis, Lactobacillus plantarum, Mycoplasma mycoides, and Helicobacter hepaticus. The functions of these proteins are still unknown. A BLASTP search did not reveal any homolog of Lmo0460 in EGD and 10403S (Table 2). However, the gene is conserved in many L. monocytogenes strains of different serovars.
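The lipobox consensus and the MFXXCX repeat ending described above translate directly into regular expressions. The sketch below uses Python's `re` module; the 40-residue search window and the toy sequence are assumptions for illustration (the actual Lmo0460 sequence is given in Fig. 2D and is not reproduced here). Because the Lmo0460 lipobox differs slightly from the consensus, a strict pattern like this one would not necessarily match it; a looser pattern would be needed in practice.

```python
import re

# Strict consensus lipobox: L(-3), S or A (-2), A or G (-1), invariant C(+1).
LIPOBOX = re.compile(r"L[SA][AG]C")
# Ending of most Lmo0460-type repeats: M, F, two arbitrary residues, C, one arbitrary residue.
MF_MOTIF = re.compile(r"MF..C.")


def find_lipobox(protein, window=40):
    """Return the 1-based position of the lipobox Cys within the first `window`
    residues (window size is an assumption), or None if there is no match."""
    match = LIPOBOX.search(protein[:window])
    # match.end() is the 0-based index just past the Cys, i.e. its 1-based position.
    return match.end() if match else None


toy = "MKKLIAVLLSGCSS" + "NQITDLSGMFKLCN"  # hypothetical sequence, not Lmo0460
print(find_lipobox(toy))              # 12
print(len(MF_MOTIF.findall(toy)))     # 1
```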

As previously reported, inlH from EGD-e comprises the 5' end of inlC2 and the 3' end of inlD, both found in EGD and 10403S, and likely results from a recombination event (Fig. S3B). InlH and InlC2 proteins are highly homologous; they have the same LRR domain, and their C-terminal regions differ by only 13 amino acids (Fig. S3C).

Presence of a PrfA* mutation in EGD. Expression of virulence genes at the right time and place during infection is critical for the outcome of the disease and is thus highly regulated. PrfA is a regulator of the major virulence genes (48-50). It belongs to the cyclic AMP (cAMP) receptor protein (Crp)/fumarate nitrate reductase regulator (Fnr) family of bacterial transcription factors. PrfA is itself regulated by an RNA thermosensor that allows PrfA-regulated genes to be expressed at 37°C (41), the temperature of infected hosts. PrfA is also regulated by nutrient availability via a short noncoding RNA generated by a riboswitch (40).

Among the SNPs detected between the different genomes, a remarkable one is present in the prfA gene of strain EGD. We found 2 amino acid changes in PrfA of strain EGD compared to PrfA in EGD-e and 10403S: a glycine is changed into a serine at position 145, and a cysteine is changed into a tyrosine at position 229 (Fig. 3A). While the impact of the latter change, located in the G α-helix, on PrfA function is not known, the former, located in the D α-helix, is well known: it is a PrfA* mutation (51). This Gly145Ser mutation is believed to induce a conformational change in the PrfA protein, leading to a constitutively active protein and overexpression of the virulence locus and of the whole PrfA regulon (52). Strikingly, PrfA is the only protein in the whole virulence locus whose amino acid sequence in EGD differs from that in 10403S. All the other proteins are identical in the two strains but show some differences in strain EGD-e (Fig. 3B and Data Set S1). This result suggested that EGD might express the PrfA regulon in a way very different from that of EGD-e and 10403S (see below).
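Substitutions such as Gly145Ser (G145S) are read off a protein alignment by comparing residues position by position. A minimal Python sketch, assuming two already-aligned, gap-free sequences of equal length; the function name and the five-residue toy fragment (numbered from 144) are hypothetical and are not the actual PrfA sequence.

```python
def substitutions(reference, variant, offset=1):
    """List residue changes in <ref><position><alt> notation (e.g. G145S).

    reference, variant -- aligned, gap-free protein sequences of equal length
    offset             -- residue number of the first aligned position
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{position}{alt}"
        for position, (ref, alt) in enumerate(zip(reference, variant), start=offset)
        if ref != alt
    ]


# Hypothetical fragment numbered from residue 144.
print(substitutions("KGIHE", "KSIHE", offset=144))  # ['G145S']
```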

It is noteworthy that our analysis of NrdD, a class III anaerobic ribonucleotide reductase (RNR), in EGD showed that a KITPFE motif present in strain EGD (Fig. S1C), as well as in 10403S, is absent in EGD-e (53), revealing a higher capacity of the first two strains to live under anaerobic conditions, including those of the gastrointestinal tract.

FIG 2 Conservation of ORFs, small RNAs, and internalins in EGD, EGD-e, and 10403S. (A) Venn diagram showing the numbers of ORFs common to the different strains. A bidirectional best-hit search with an E value lower than 1e-4 was used to determine homologies. (B) Venn diagram of the small RNAs found in the three strains. The percentage of similarity was calculated from BLASTN results. Small RNAs with a percentage lower than 10% were not considered conserved. (C) Genomic locations of 27 internalins in the EGD-e, EGD, and 10403S genomes (using CGView [81]). Internalins present in only one or two strains are indicated in red. (D) Lmo0460 amino acid sequence. Lmo0460 is a predicted lipoprotein present only in EGD-e which contains an atypical leucine-rich repeat (LRR) domain.

TABLE 2 List of 27 internalins found in EGD-e, EGD, and 10403S
Internalin (no. of SNPs) for:
Internalin genes extensively studied.
inlH is in EGD-e; inlC2 is in EGD and 10403S.
Boldface indicates internalin genes present in one or two strains.

The PrfA core regulon in EGD is overexpressed. To assess the impact of the PrfA* mutation in EGD, we constructed a PrfA* mutant in EGD-e by generating a Gly145Ser mutation and compared the phenotypes of the two strains (EGD and EGD-e PrfA*) to that of the EGD-e strain in exponential phase, after growth in brain heart infusion (BHI) at 37°C. We first performed a whole-genome transcriptomic analysis of the resulting EGD-e PrfA* strain using our Affymetrix tiling array (35). As EGD-e and EGD share more than 95% of their ORFs and sRNAs, our tiling arrays could also be used for EGD transcriptomic analysis. We found that in both EGD and EGD-e PrfA*, the core PrfA regulon (54), which contains the whole virulence locus, the inlA-inlB operon, inlC (lmo1786), and hpt (lmo0838), is overexpressed compared to its expression in the reference strain EGD-e (Fig. 3C and Data Set S3), confirming the effect of the Gly145Ser PrfA* mutation on the core PrfA regulon. All these genes have a canonical PrfA box upstream from their start codon, which allows the direct binding of PrfA.

Notably, we found that only 15 genes in the EGD-e PrfA* strain (Fig. S4A) are expressed differentially from those in EGD-e when bacteria are grown to an optical density at 600 nm (OD600) of 1.0 at 37°C. This list includes the 11 genes of the PrfA core regulon. Lmo2269 was found to be differentially expressed in L. monocytogenes P14prfA* versus P14ΔprfA after growth in BHI (85). The three remaining genes, argC (lmo1591), argG (lmo2090), and lmo0640, have to our knowledge never been described in PrfA regulation studies. In EGD, many more genes (128) are expressed differentially compared to EGD-e under reference conditions than in EGD-e PrfA* (Fig. S4A); this set also includes the core PrfA regulon (54). Overexpression of inlA, inlB, inlC, hly, and lmo0042 (which is similar to the gene for the Escherichia coli DedA protein, an inner membrane protein) was confirmed by quantitative reverse transcription-PCR (qRT-PCR) in the three strains (Fig. S5). The overproduction of the InlA, LLO, and InlB proteins was also confirmed by Western blotting (Fig. 3D). Examination of InlA at the bacterial surface (Fig. 3E) by immunofluorescence assay (55) showed that InlA decorates the bacterial body and accumulates at the poles in EGD-e PrfA* and EGD. In contrast, in EGD-e, InlA is detected at the surface as helical dots, in agreement with the results of our previous studies (56). ActA was more highly expressed at the bacterial surface in EGD-e PrfA* and EGD than in EGD-e. Of interest, exposure of ActA on the surface seems to be a bistable process, as only half of the cells express it (Fig. 3E).

We also looked at differentially expressed RNAs. Our statistical analysis revealed a total of 27 sRNAs that were expressed differentially in EGD and EGD-e PrfA* compared to EGD-e (Fig. S4B and S5 and Data Set S3).

FIG 3 PrfA* mutation and the overexpression of the PrfA core regulon in EGD. (A) Protein sequence alignment of PrfA in 43 L. monocytogenes strains. The well-known PrfA* mutation G145S (51, 82) is highlighted in red and appears only in the EGD and M7 strains. All other amino acid changes found are drawn showing their positions in the different domains of PrfA (52). HTH, helix-turn-helix. (B) Schematic representation of the virulence locus synteny in EGD-e, EGD, and 10403S. Amino acid differences from EGD-e's sequence are displayed. (C) Genome browser view showing tiling array whole-transcriptome coverage of the virulence locus and the inlA-inlB operon in EGD-e, EGD-e PrfA*, and EGD. Each tiled probe indicating expression from the two genomic strands (top for plus strand, bottom for minus strand) is represented as a black dot for EGD-e, an orange dot for EGD-e PrfA*, and a green dot for EGD. (D) Comparison of expression levels of InlA, InlB, and LLO in EGD-e, EGD-e PrfA*, EGD, and L. innocua CLIP 11262 (used as a nonpathogenic reference bacterium) in whole bacterial lysates or in the cell wall fraction (InlA). (E) Immunofluorescence of InlA and ActA in EGD and EGD-e PrfA* in BHI medium.

Phenotypic effect of the PrfA* mutation. The two PrfA* strains, EGD-e PrfA* and EGD, grow more slowly in broth than strains EGD-e and 10403S. This was confirmed by a colony size analysis showing larger colonies for EGD-e than for EGD-e PrfA* and for 10403S than for EGD on BHI agar plates after 24 h of growth (Fig. 4A). This is in accordance with the defect already observed in 10403S PrfA* strains (57).

We performed classical gentamicin invasion assays (58) using strains EGD, 10403S, EGD-e PrfA*, and EGD-e in three different cell lines: HeLa (in which entry is InlB dependent), JEG3 (in which entry is InlA and InlB dependent), and Raw264 (macrophages). In HeLa and JEG3 cells, strains EGD and EGD-e PrfA* were more invasive than EGD-e and 10403S (Fig. 4B). There were also higher bacterial counts in mouse Raw264 macrophages for EGD-e PrfA* than for EGD-e and for EGD than for 10403S.
FIG 4 Differential bacterial invasion phenotypes in vitro. (A) Colony sizes of the three strains after 24 h of growth on solid BHI agar plates reveal that EGD-e PrfA* has smaller colonies than EGD-e and that EGD has smaller colonies than 10403S. (B) Gentamicin assays at 2 h postinfection of HeLa, JEG3, and Raw264 cells by the four different strains, EGD-e, EGD-e PrfA*, 10403S, and EGD. ns, not significantly different; *, P value of <0.05; **, P value of <0.005; ***, P value of <0.0005.

We performed plaque assays in L2 fibroblast cells. We observed a larger number of plaques for each PrfA* strain (Fig. 5A and B). Strikingly, plaque size was larger for the 10403S strain than for the EGD-e, EGD-e PrfA*, and EGD strains (Fig. 5C).

We then evaluated the virulence of the different strains in mice. Clearly, strains EGD and 10403S are less virulent than EGD-e, as revealed by lower bacterial counts in blood, liver, and spleen (Fig. 5D), indicating that many factors control virulence. In addition, the number of EGD-e PrfA* bacteria was higher than the number of wild-type EGD-e bacteria in blood, liver, and spleen, in agreement with the observed overexpression of the PrfA regulon (Fig. 5D). Increased virulence was also observed for 10403S PrfA* (57). Strikingly, despite a clear phenotypic difference in tissue culture cells, bacterial counts in mouse spleen and liver 72 h after intravenous infection did not reveal clear differences between the EGD and 10403S strains.

FIG 5 Plaque assays, virulence in mice, and ActA amino acid changes. (A) Plaque assays of EGD-e, EGD-e PrfA*, 10403S, and EGD at different MOIs show different sizes depending on the strain. The MOIs used for plaque size measurement are highlighted in red. (B) Magnifications (×5.4) of the plaques for EGD-e, EGD-e PrfA*, 10403S, and EGD at an MOI of 0.01. (C) Measurement of plaque size (in square pixels) using Icy image analysis software (80). Different MOIs were used for each strain in order to have the same number of plaques in each well. Differences between strains were assessed by unpaired t test. Plaques from 10403S are bigger than those from EGD-e and the two PrfA* strains (EGD-e PrfA* and EGD). (D) CFU counts measured 72 h after intravenous infection with EGD-e, EGD-e PrfA*, 10403S, and EGD. Each dot represents the value for one mouse, and asterisks indicate Mann-Whitney statistical test results; results are from two independent experiments. (E) Motifs of the ActA protein (adapted from reference 61) and the different amino acid changes between EGD-e and EGD. ActA has the same amino acid sequence in EGD and 10403S. The two WASP-like sequences of ActA present no differences between EGD-e, EGD, and 10403S.

DISCUSSION

Here we report the genome sequence of L. monocytogenes strain EGD and compare it to the genomes of strains EGD-e and 10403S, the two other Listeria strains widely used by immunologists and listeriologists. Although more than 95% of the ORFs are conserved in EGD, 10403S, and EGD-e, we found many critical differences between these strains. Altogether, our study revealed that the EGD strain is closer to 10403S than to EGD-e and that EGD-e is quite different from EGD. We also detected a PrfA* mutation in EGD.

We confirmed the effect of the PrfA* mutation on cell invasion both by invasion assays and by plaque assays, but we did not observe increased virulence of EGD in mice. Moreover, the plaque size comparison revealed no difference between the EGD-e, EGD-e PrfA*, and EGD strains, whereas we detected larger plaques for 10403S. These many discrepancies, which cannot be explained solely by the overexpression of PrfA-regulated virulence factors, need further investigation. A first element to decipher is the complete role of ActA in these phenotypes. ActA is known to trigger intra- and intercellular movements (59) and to mediate escape from autophagy (60), and it is also implicated in interbacterial adhesion during intestinal colonization (61). Here we found 27 amino acid changes between EGD-e ActA and the ActA proteins of strains EGD and 10403S (Fig. 5E); ActA is the most variable protein within the whole virulence locus (Fig. 3B). The highest proportion of amino acid variation is found in the actin nucleation motif of ActA (Fig. 5E). A thorough comparative analysis of the actin tail lengths and intracellular speeds of the different strains may provide insight into the implications of these ActA amino acid changes for plaque size differences.

Altogether, our analysis indicates that the PrfA* mutation does not confer an advantage over the whole course of Listeria infection. We compared all PrfA protein sequences in 39 published L. monocytogenes genomes. The PrfA* mutation appears only in the EGD and M7 strains (Fig. 3A). (M7 is a nonpathogenic serovar 4a strain isolated from cow's milk [62].) We conclude that the PrfA* mutation does not provide a selective advantage; otherwise, it would have been found in many more strains. In the specific case of the EGD strain, the PrfA* mutation might have been acquired through years of passage in mice, followed by plating on blood agar.

To understand whether the differences between EGD and EGD-e come from mislabeling or from an accumulation of mutations during evolution, we searched for the strains phylogenetically closest to EGD and EGD-e. According to the NCBI genome database, the strain phylogenetically closest to EGD is SLCC5850, a serotype 4b strain isolated in 1983 from a man with meningitis, according to the Seeliger collection database (63). Its main feature is the loss of important motifs in its PrfA protein (Fig. 3A). The loss of these motifs was also found in the nonhemolytic SLCC53 strain when PrfA was sequenced in 1991 (49). SLCC53 is the type strain of Listeria (64) and thus originates from the rabbit strain isolated by E. G. D. Murray in 1924 (see reference 13). The strains phylogenetically closest to EGD-e are SLCC2372, a serotype 1/2c strain isolated from a human in 1935, and SLCC2479, a serotype 3c strain of unknown origin isolated in 1966 (12). Comparing the close phylogenetic neighbors of each strain, it seems more likely that EGD is closer to the original type strain and that EGD-e was mislabeled and exchanged for another strain. However, a complete answer cannot be given until a phylogenomic analysis of all available Listeria strains is performed; only such an analysis could fully characterize the relationships between strains. Moreover, given almost a century of Listeria strain isolation and culture, it clearly seems impossible to decipher completely the many events that might have led to what appears to be a mislabeling of strains.

Since the pioneering work of Mackaness, L. monocytogenes has been, and still is, widely used as a tool to study the induction of T-cell responses as well as to analyze the response to infection in knockout mice (65-68). In these studies, infections are performed with a variety of L. monocytogenes strains, including the three strains EGD, EGD-e, and 10403S. However, strain-specific differences are not taken into account except when using mutants, such as the nonhemolytic mutant or the ActA mutant strains. We consider that many factors in addition to LLO and ActA can affect survival in the host. It is thus of the utmost importance in any report to indicate precisely which strain has been used. Notably, Listeria has recently been engineered as a promising live-vaccine strain against viral infection and cancer (69-74). In most cases, strain 10403S is the original strain used. Given the results reported here, it will be important to use the same original strain in future constructions and vaccine trials. In conclusion, our results highlight strain-specific genomic differences with important consequences for the interpretation of results in both infection biology and immunology. We hope the genomic comparisons that we provide here will help listeriologists to go further in their investigations, and we strongly recommend that authors always indicate the names and origins of the Listeria strains used in their studies.

MATERIALS AND METHODS 

For more information on materials and methods, see Text S4 in the sup- 
plemental material. 

Listeria monocytogenes EGD and EGD-e sequencing and annotation. Briefly, genomic DNA was prepared as described in reference 75. Libraries were prepared using the NEBNext DNA sample prep master mix set 1 with the multiplexing sample preparation oligonucleotide, according to the manufacturer's recommendations. Libraries were then sequenced on a HiSeq 2000 sequencer as 100-base single-end reads. Sequence files were generated using Illumina Analysis Pipeline version 1.7 (CASAVA). After quality filtering, 25,827,948 reads were aligned with the Listeria monocytogenes 10403S genome sequence (GenBank accession number CP002002) using CLC Genomics Workbench (version 3.20), and more than 98.4% of reads mapped successfully. The remaining 407,195 reads were then used to sequentially fill gaps in the final sequence. The overall final coverage was 875×, with only 47,125 unmapped reads.
EGD-e was sequenced using the same protocol, with a total of 13,414,584 
reads. 
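As a rough consistency check on the read counts above, the expected mean depth can be estimated with the standard Lander-Waterman relation C = N × L / G (number of reads × read length / genome size). The sketch below assumes a genome size of about 2.9 Mb, which is not stated in this section; the reported 875× comes from the authors' own assembly and may have been computed differently.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean sequencing depth, C = N * L / G."""
    return n_reads * read_length / genome_size


def mapped_fraction(total_reads: int, unmapped_reads: int) -> float:
    """Fraction of reads that aligned to the reference genome."""
    return (total_reads - unmapped_reads) / total_reads


TOTAL_READS = 25_827_948      # reads after quality filtering (from the text)
UNMAPPED_FIRST_PASS = 407_195  # reads not aligned to 10403S (from the text)
READ_LENGTH = 100              # single-end read length in bases (from the text)
GENOME_SIZE = 2_900_000        # ASSUMPTION: approximate L. monocytogenes genome size

if __name__ == "__main__":
    mapped = TOTAL_READS - UNMAPPED_FIRST_PASS
    print(f"mapped fraction: {mapped_fraction(TOTAL_READS, UNMAPPED_FIRST_PASS):.2%}")
    print(f"all reads:       {mean_coverage(TOTAL_READS, READ_LENGTH, GENOME_SIZE):.0f}x")
    print(f"mapped reads:    {mean_coverage(mapped, READ_LENGTH, GENOME_SIZE):.0f}x")
```

Under the assumed genome size, the mapped reads alone give roughly 877×, close to the reported value, and the mapped fraction (about 98.42%) agrees with the "more than 98.4%" stated above.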

The consensus sequence of EGD was exported and annotated using RAST annotation software (76). The automatic annotation provided by RAST was curated using homology to proteins of Listeria monocytogenes EGD-e (7) and Listeria monocytogenes 10403S, and the sequence was submitted to the ENA database (accession number HG421741). Interactive visualization of the syntenic organization of Listeria genomes is available with the Flash-based SynTView software (77) at http://genopole.pasteur.fr/SynTView/flash/Listeria_monocytogenes/SynWebEGD_final.html.

Transcriptomic analysis. Overnight bacterial cultures were diluted in BHI, and bacteria were grown to an OD of 1. RNA was extracted, and samples for each chip were prepared as previously described (35). Two types of tiling chip arrays were used: the gene expression array (E-MTAB-1676; https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1676/) and the tiling array (E-MTAB-1677; https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1677/). Genes with an average false-discovery rate (Benjamini and Yekutieli method [84]) (FDR_BY) below 0.05 and an absolute log fold change (|logFC|) of >1.5 were selected as potentially differentially expressed genes. For small RNAs (sRNAs), we applied cutoffs of a t test P value of <0.05 and an |logFC| of >1.5; after manual curation, we obtained a list of potentially differentially expressed sRNAs. Genes and sRNAs of interest were then studied by real-time PCR on an ABI PRISM 7900HT system (Applied Biosystems), with expression normalized to that of the gyrase gene (lmo0007), and values were compared by an unpaired t test.
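The gene-selection rule above can be sketched as follows. This is a minimal illustration, assuming one P value and one logFC per gene; the "average" FDR across probes mentioned in the text is not reproduced, and the function names `by_adjust` and `select_differential` are hypothetical. The Benjamini-Yekutieli adjustment multiplies each sorted P value by m·c(m)/rank, where c(m) = 1 + 1/2 + … + 1/m, and then enforces monotonicity.

```python
from typing import List, Sequence


def by_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Yekutieli adjusted P values (FDR control under dependence)."""
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest P to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_differential(names: Sequence[str], pvals: Sequence[float],
                        logfc: Sequence[float], fdr_cutoff: float = 0.05,
                        lfc_cutoff: float = 1.5) -> List[str]:
    """Keep genes with FDR_BY < fdr_cutoff and |logFC| > lfc_cutoff."""
    fdr = by_adjust(pvals)
    return [name for name, q, lfc in zip(names, fdr, logfc)
            if q < fdr_cutoff and abs(lfc) > lfc_cutoff]


if __name__ == "__main__":
    # Hypothetical values for four genes, for illustration only
    genes = ["g1", "g2", "g3", "g4"]
    print(select_differential(genes, [0.001, 0.01, 0.03, 0.2], [2.0, -1.8, 3.0, 0.5]))
```

For the sRNA rule described above, the same threshold test would instead be applied to the raw t test P values, followed by manual curation.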

Listeria strains used. Throughout this study, we used the following strains: EGD (BUG600), 10403S (BUG1361), EGD-e (BUG1600), EGD-e PrfA* (BUG3057), and L. innocua CLIP11262 (BUG499). BUG numbers are identification numbers in the Listeria strain collection of the Unité des Interactions Bactéries-Cellules laboratory.

Bacterial lysis, cell wall extraction, and protein detection. For preparation of whole bacterial lysates, 1X10^ bacteria from overnight cultures were washed 3 times in phosphate-buffered saline (PBS), lysed in 200 µl of Laemmli buffer containing 10% dithiothreitol (DTT), boiled for 10 min, and sonicated. Cell wall extraction and Western blotting of InlA, InlB, LLO, and EF-Tu were performed as previously described (78).

Gentamicin invasion assay and in vivo studies. We performed classical gentamicin invasion assays as described in reference 58. Cells were seeded in 24-well plates the day before infection and then infected with the indicated strains at a multiplicity of infection (MOI) of 1 to 25, depending on the host cell type. Bacteria in the inoculum and those recovered after gentamicin treatment were counted on BHI agar plates. Invasion was expressed as the percentage of the inoculum recovered after gentamicin treatment.
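The two quantities involved in the invasion assay can be sketched as simple arithmetic: the inoculum needed for a target MOI (bacteria per host cell) and invasion as the percentage of the inoculum recovered after gentamicin treatment. The cell number and CFU values below are hypothetical, chosen only for illustration, and the helper names are not from the original protocol.

```python
def inoculum_for_moi(moi: float, n_cells: int) -> float:
    """Number of bacteria to add per well for a target multiplicity of infection."""
    return moi * n_cells


def percent_invasion(output_cfu: float, inoculum_cfu: float) -> float:
    """Intracellular CFU recovered after gentamicin, as a percentage of the inoculum."""
    return 100.0 * output_cfu / inoculum_cfu


if __name__ == "__main__":
    # Hypothetical well: 2 x 10^5 host cells infected at MOI 10
    inoculum = inoculum_for_moi(10, 200_000)
    # Hypothetical output: 3 x 10^4 CFU counted after gentamicin treatment
    print(f"inoculum: {inoculum:.0f} bacteria")
    print(f"invasion: {percent_invasion(3.0e4, inoculum):.2f}% of inoculum")
```

Expressing invasion relative to the measured inoculum, rather than to the nominal MOI, absorbs small differences in how many bacteria were actually added to each well.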

All experiments involving mice were performed in accordance with the Pasteur Institute's guidelines for animal welfare. Eight-week-old BALB/c mice (Charles River) were injected intravenously with 10^ CFU of Listeria monocytogenes per mouse. Liver, spleen, and blood samples were recovered 72 h after infection. The organs were disrupted in 2 ml of PBS. Serial dilutions of organ homogenates and of mouse blood were plated on BHI agar plates, and the numbers of CFU were determined.
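The back-calculation from plate counts to bacterial burden can be sketched as below. Only the 2-ml homogenization volume comes from the text; the plated volume and dilution factor are assumptions (they are not stated here), and the function names and example counts are hypothetical.

```python
def cfu_per_ml(colonies: int, dilution_factor: int, plated_volume_ul: float) -> float:
    """CFU per ml of the undiluted sample (blood or homogenate) from one plate."""
    return colonies * dilution_factor * 1000.0 / plated_volume_ul


def cfu_per_organ(colonies: int, dilution_factor: int, plated_volume_ul: float,
                  homogenate_volume_ml: float = 2.0) -> float:
    """Total CFU in an organ disrupted in homogenate_volume_ml of PBS (2 ml in the text)."""
    return cfu_per_ml(colonies, dilution_factor, plated_volume_ul) * homogenate_volume_ml


if __name__ == "__main__":
    # Hypothetical spleen plate: 45 colonies from a 10^3 dilution, 100 µl plated (assumed)
    print(f"spleen: {cfu_per_organ(45, 1000, 100):.3g} CFU per organ")
    # Hypothetical blood plate: 12 colonies from a 10-fold dilution, 100 µl plated (assumed)
    print(f"blood:  {cfu_per_ml(12, 10, 100):.3g} CFU per ml")
```

Organ burdens are reported per organ because the whole organ is homogenized in a fixed volume, whereas blood is reported per milliliter.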

Plaque assay. The plaque assay procedure was adapted from the work of Kuhn et al. (79). L2 cells were grown in Ham's F-12K medium (GIBCO, Life Technologies). Monolayers were then infected at different MOIs. Infected cells were subsequently incubated at 37°C for 1 h and then washed several times with medium. Following a 48-h incubation at 37°C, cells were fixed with paraformaldehyde (4% in PBS for 20 min) and stained with crystal violet.

To compare plaque sizes, we needed a similar number of plaques in each well for each strain. We therefore selected the following MOIs: 0.1 for EGD-e, 0.001 for EGD-e PrfA*, 0.5 for 10403S, and 0.001 for EGD. Plaque size (in square pixels) was measured directly on images of the plaques, using the interior value of a region of interest (ROI) defined manually in Icy software (80). Nearly 30 plaques were measured for each strain. An unpaired t test was used to assess differences in plaque size between strains.
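The unpaired comparison of plaque areas can be sketched with the standard library. The text does not say whether a Student (pooled-variance) or Welch test was used; this sketch assumes the classic Student test, and it returns only the t statistic and degrees of freedom. The P value would then be read from a t distribution with that many degrees of freedom, for example with a statistics package (SciPy's `scipy.stats.ttest_ind(a, b)` with its default `equal_var=True` computes the same statistic). The plaque areas below are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Unpaired Student t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance from the two sample variances
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # standard error of the mean difference
    return (mean(a) - mean(b)) / se, df


if __name__ == "__main__":
    # Hypothetical plaque areas (square pixels) for two strains
    strain_a = [812.0, 790.0, 845.0, 770.0, 828.0]
    strain_b = [640.0, 702.0, 655.0, 690.0, 671.0]
    t, df = student_t(strain_a, strain_b)
    print(f"t = {t:.2f}, df = {df}")
```

With about 30 plaques per strain, as in the text, a Welch test (which does not assume equal variances) would be a reasonable alternative if plaque-size variability differs markedly between strains.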

Nucleotide sequence accession number. The sequence of EGD was 
submitted to the ENA database under accession number HG421741. 